A method of discovering specific functional antibodies

ABSTRACT

The invention relates to a method for discovering specific functional antibodies, in particular to a method based on sequence composition and frequency analysis of the antibody variable regions of an immunized host. Compared with conventional high-throughput antibody sequencing methods, this method quickly and accurately obtains the full-length sequence of the antibody variable region containing a candidate CDR3, achieves a high success rate of antibody gene pairing, is suitable for samples of limited quantity, and improves the efficiency of obtaining specific functional antibodies.

TECHNICAL FIELD OF THE INVENTION

The invention relates to the field of biotechnology, in particular to a method of discovering specific functional antibodies, and more particularly to a method of discovering specific functional antibodies based on the composition and frequency analysis of the antibody variable regions of an immunized host.

BACKGROUND TECHNOLOGY

An antibody that is produced by a single B-cell clone, is completely uniform, and is directed against only one specific antigen epitope is called a monoclonal antibody. Since Kohler and Milstein reported in 1975 the use of the hybridoma technique to obtain monoclonal antibodies against sheep erythrocytes, monoclonal antibodies have been widely used in medical diagnostics, the separation of biomolecules and other fields. Because of their strong specificity, high purity and good uniformity, monoclonal antibodies have improved the efficiency and accuracy of medical testing. In recent years, with the development of tumor immunotherapy, monoclonal antibodies against immune checkpoints have shown great promise in tumor immunotherapy.

In addition to the hybridoma technique, many techniques have been applied to the production of monoclonal antibodies, such as in vitro display techniques represented by phage display, single-cell cloning techniques, EB virus-mediated B cell immortalization techniques, and protein spectrum analysis combined with high-throughput DNA sequencing. Each technique has its own strengths and weaknesses. For example, hybridoma techniques are limited to a few species such as mice or rabbits, the subsequent isolation of monoclonal hybridomas is time-consuming and laborious, and monoclonal antibody production by hybridomas is unstable and easily lost. In vitro display techniques also require a long period, and the resulting antibodies, like hybridoma antibodies, need to be humanized.

The antibody discovery method based on high-throughput sequencing developed in recent years reveals how the abundance of specific antibodies in the antibody gene spectrum changes under a specific immune status by comparing the frequency of CDR3 sequences in the host antibody gene spectrum before and after immunization. The full-length sequences containing the CDR3 sequences with the largest frequency change are selected as candidate sequences of specific antibodies and are paired and expressed in vitro for functional validation. This method is not limited by species. A large number of candidate pairs can be generated from the antibody gene spectrum in a short period for functional verification, and antibodies with high affinity may be obtained.

The characteristics of second-generation sequencing, such as read length, accuracy, throughput and cost performance, make it suitable for studying the diversity of antibody library composition. However, there are technical difficulties in antibody sequencing library construction, high-throughput antibody sequencing and data analysis. As a result, the obtained antibody gene spectrum information is inaccurate, the antibody gene information it contains is biased, the variation of CDR3 frequency is difficult to define accurately, and the efficiency of in vitro pairing and functional verification of the selected full-length genes is low.

Second-generation sequencing has been widely used to study the diversity of the composition of the antibody gene spectrum. However, avoiding the introduction of bias during sequencing library construction has always been a difficult problem. At present, RT or RACE methods are frequently used to construct high-throughput antibody sequencing libraries. The RT method designs primers according to the conserved sequences of known antibody variable regions and uses them to amplify the variable regions. Because of the very high sequence diversity of antibody genes, multiple primers or degenerate primers are frequently used in multiplex PCR in practice, in order to cover the antibody genes to the maximum extent. However, multiple-primer amplification, sample RNA degradation, incomplete databases and so on inevitably introduce bias in library construction and data analysis. As a result, the sequencing results cannot truly reflect the composition and changes of the antibody variable regions. The difficulty in constructing a RACE library, in turn, is that it places high requirements on the quality and quantity of sample RNA. In practice it is difficult to obtain enough sample; in particular, the amount of clinicopathological samples is often insufficient for RACE library construction.

SUMMARY

In view of the shortcomings of the prior art and practical requirements, the invention provides a method of discovering specific functional antibodies. The method can comprehensively study the composition and distribution of the variable regions of antibody genes from multiple species without bias.

For this purpose, the invention adopts the following technical scheme:

The invention provides a method of discovering specific functional antibodies, including the following steps:

(1) Extract total RNA from a host subject immunized against at least one target antigen and construct a high-throughput antibody sequencing library;

(2) Use the high-throughput sequencing library constructed in step (1) for high-throughput sequencing of the variable regions of the immunoglobulin genes, and obtain the gene spectrum of the antibody variable regions for the at least one target antigen;

(3) Select candidate CDR3 and the corresponding heavy chain and light chain antibody nucleic acid sequences from the gene spectrum of the antibody variable regions as candidate pairing sequences;

(4) Pair the selected light chain and heavy chain genes and express them in vitro to produce candidate recombinant antibodies.

The selection of candidate CDR3 from the gene spectrum of the antibody variable regions includes analyzing the high-throughput sequencing results of Read1 and/or Read2 separately, selecting candidate CDR3 homologous clusters, and, in combination with the results of splicing Read1 and Read2, identifying the full-length gene of the antibody variable region containing this CDR3 homologous cluster.

In the invention, both the results of Read1 and/or Read2 and the results of Read1 and Read2 splicing are analyzed.

As a preferred technical scheme, the method of constructing the high-throughput antibody sequencing library described in step (1) is RACE library construction and/or RT library construction, that is, RACE library construction alone, RT library construction alone, or both RACE library construction and RT library construction at the same time.

Preferably, the RACE library construction includes the following steps:

(a) Obtain the cells of the subjects and isolate the total RNA;

(b) Use oligo(dT) as primer and the total RNA from step (a) as template to synthesize cDNA by reverse transcription;

(c) Use the cDNA produced in step (b) as template, amplify the antibody genes by the two-step PCR amplification method or the one-step PCR method, and then construct the high-throughput antibody sequencing library by the amplicon library construction method.

The invention constructs RACE libraries of the heavy chain, Kappa chain and Lambda chain separately and sequences the libraries on the Illumina system: the heavy chain library is sequenced separately, and the Kappa and Lambda chain libraries are sequenced either combined or separately.

In the invention, the optimized two-step PCR amplification is used to construct the library, which reduces the RNA input of the sample to 20 ng, simplifies the library construction process, and maintains the coverage and accuracy of the obtained gene spectrum of the antibody variable regions.

Preferably, the RT library construction includes the following steps:

(a′) Obtain the cells of the subjects and isolate the total RNA;

(b′) Use oligo(dT) as primer and the total RNA from step (a′) as template to synthesize cDNA by reverse transcription;

(c′) Use the cDNA produced in step (b′) as template, amplify the antibody genes with specific primers, and then construct a DNA library from the PCR-amplified products.

In the invention, the PCR amplification program in the RACE library construction and RT library construction is as follows:

① 95°C, 2 min;
② 95°C 30 sec, Tm 30 sec, 72°C 30 sec; 15-35 cycles;
③ 72°C, 7 min;
④ 4°C, store.
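As a rough illustration, the nominal hold time of the program above can be estimated from the listed step durations. This is a minimal sketch: the function name is hypothetical, ramping between temperatures is ignored, the final 4°C storage step is not counted, and the annealing temperature Tm is not needed because only durations are summed.

```python
def program_seconds(cycles: int) -> int:
    """Nominal hold time in seconds of the PCR program, excluding ramping and the 4 degC hold."""
    if not 15 <= cycles <= 35:
        raise ValueError("the program specifies 15-35 cycles")
    initial_denaturation = 2 * 60   # step 1: 95 degC, 2 min
    per_cycle = 30 + 30 + 30        # step 2: 95 degC, Tm and 72 degC, 30 sec each
    final_extension = 7 * 60        # step 3: 72 degC, 7 min
    return initial_denaturation + cycles * per_cycle + final_extension


if __name__ == "__main__":
    for n in (15, 25, 35):
        print(n, "cycles:", program_seconds(n) / 60, "min")
```

Under these assumptions the cycle number dominates the run time, which is one reason a library protocol would keep it as low as the template amount allows.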

Preferably, the subjects described in step (a) of the RACE library construction are any one or a combination of at least two of mammals, amphibians, fish or birds;

Preferably, the mammals are any one or a combination of at least two of humans, mice, primates or rabbits.

Preferably, the cells described in step (a) of the RACE library construction are derived from any one or a combination of at least two of the peripheral blood, lymphoid organs, spleen, bone marrow or liver of the subjects;

Preferably, the cells described in step (a) of the RACE library construction include any one or a combination of at least two of memory B cells, plasma cells or plasmablasts.

Preferably, the PCR primers used in the first round of the two-step PCR amplification described in step (c) of the RACE library construction include a forward primer and a reverse primer. The forward primer contains a partial forward joint (adapter) sequence and the sequence shown in SEQ ID NO.1. The reverse primer contains a partial reverse joint sequence and a sequence shown in one of SEQ ID NO.2-4:

SEQ ID NO. 1: AAGCAGTGGTATCAACGCAGAGTA;
SEQ ID NO. 2: GGAAGACCGATGGGCCCTTGGTGG;
SEQ ID NO. 3: GCAGGCACACAACAGAGGCAGTTCCAG;
SEQ ID NO. 4: CACACCAGTGTGGCCTTGTTGGCTT.

The forward primer used in the first round of PCR in the two-step PCR amplification described in step (c) of the RACE library construction is as follows:

Partial forward joint sequence-AAGCAGTGGTATCAACGCAGAGTA (the forward primer sequence is from the Clontech SMARTer RACE 5′/3′ Kit).

As an example, the reverse primer used in the first round of PCR in the two-step PCR amplification described in step (c) of the RACE library construction can take one of the following three forms:

Partial reverse joint sequence-GGAAGACCGATGGGCCCTTGGTGG (heavy chain IgG reverse primer);
Partial reverse joint sequence-GCAGGCACACAACAGAGGCAGTTCCAG (kappa reverse primer);
Partial reverse joint sequence-CACACCAGTGTGGCCTTGTTGGCTT (lambda reverse primer).

Preferably, the partial forward joint sequence includes 5-60 bp of the 3′ end of the Illumina forward joint primer, and the partial reverse joint sequence includes 5-60 bp of the 3′ end of the Illumina reverse joint primer;

Preferably, the sequence of the Illumina forward joint primer is shown in SEQ ID NO.5, and the sequence of the Illumina reverse joint primer is shown in SEQ ID NO.6.

SEQ ID NO. 5: AATGATACGGCGACCACCGAGATCTACACTATAGCCTACACTCTTTCCCTACACGACGCTCTTCCGATCT;
SEQ ID NO. 6: CAAGCAGAAGACGGCATACGAGATAGCTTCAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT.
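The composition of a first-round primer described above (a partial joint sequence taken from the 3′ end of the Illumina joint primer, followed by the gene-specific sequence) can be sketched as follows. This is only an illustration: the function name and the tail length of 20 are hypothetical choices within the stated 5-60 bp range, not values given in the text.

```python
ILLUMINA_FORWARD = (  # SEQ ID NO.5
    "AATGATACGGCGACCACCGAGATCT"
    "ACACTATAGCCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
)
ILLUMINA_REVERSE = (  # SEQ ID NO.6
    "CAAGCAGAAGACGGCATACGAGATAGCTTCAGGTGACTGGAGTTCAGACGTGT"
    "GCTCTTCCGATCT"
)
UPM_CORE = "AAGCAGTGGTATCAACGCAGAGTA"     # SEQ ID NO.1
IGG_REVERSE = "GGAAGACCGATGGGCCCTTGGTGG"  # SEQ ID NO.2


def first_round_primer(joint_primer: str, gene_specific: str, tail_len: int) -> str:
    """Join the last tail_len bases of the joint primer to the gene-specific sequence (5' to 3')."""
    if not 5 <= tail_len <= 60:
        raise ValueError("the partial joint sequence is 5-60 bp long")
    return joint_primer[-tail_len:] + gene_specific


if __name__ == "__main__":
    # tail_len=20 is an arbitrary illustrative value
    print(first_round_primer(ILLUMINA_FORWARD, UPM_CORE, 20))
    print(first_round_primer(ILLUMINA_REVERSE, IGG_REVERSE, 20))
```

Keeping the joint tail at the 5′ end leaves the gene-specific 3′ end free to anneal in the first round, while the tail gives the second-round Illumina primers a site to extend from.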

Preferably, the primers used in the second round of PCR in the two-step PCR amplification described in step (c) of the RACE library construction are conventional Illumina library construction primers.

Preferably, the high-throughput sequencing is any one or a combination of at least two of sequencing by synthesis, sequencing by ligation, sequencing by hybridization, single-molecule DNA sequencing, multiplex polony sequencing or nanopore sequencing, preferably sequencing by synthesis, and further preferably Illumina platform sequencing.

In the invention, the Illumina Miseq 2x300 sequencing system is used. Although the antibody variable region gene is long and essentially reaches the upper limit of Illumina read length, the Illumina Miseq 2x300 system offers higher accuracy, higher throughput and lower cost than other high-throughput sequencing methods such as Roche 454. Compared with third-generation sequencing methods, it has no advantage in read length, but its advantages in throughput and sequencing cost are obvious.

In the invention, the analysis method adopts paired-end sequencing followed by selection, which compensates for the limited read length of Illumina sequencing, so that the full-length sequence can be obtained more accurately.

In the invention, because it is difficult for the existing Illumina Miseq 2x300 system to guarantee coverage of the full length of the antibody variable region, an optimized method for selecting the full length of the antibody variable region is developed: the full-length sequence of the antibody variable region is obtained by combining the results of RACE and RT library construction and sequencing, and the bias due to incomplete data is reduced.

Preferably, the principle of selecting candidate CDR3 is any one or a combination of at least two of the following: high-frequency sequences after CDR3 clustering; CDR3 sequences whose frequency after immunization or in the outbreak phase is significantly higher than before immunization or in the convalescence phase; or CDR3 sequences whose corresponding V gene family after immunization or in the outbreak phase differs significantly from that before immunization or in the convalescence phase.
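The second principle (a CDR3 whose frequency is markedly higher after immunization or in the outbreak phase) can be illustrated with a short sketch. The function names and all thresholds (`min_post_pct`, `min_fold`, `floor_pct`) are hypothetical; the text does not define what counts as "significantly higher".

```python
from collections import Counter


def frequencies(cdr3_reads):
    """Percentage frequency of each CDR3 among the given reads."""
    counts = Counter(cdr3_reads)
    total = sum(counts.values())
    return {cdr3: 100.0 * n / total for cdr3, n in counts.items()}


def enriched_cdr3(post_reads, pre_reads, min_post_pct=1.0, min_fold=5.0, floor_pct=0.01):
    """Return (cdr3, post %, fold change) for CDR3 enriched in post_reads, highest frequency first.

    floor_pct replaces a zero pre-frequency so that the fold change stays finite.
    """
    post = frequencies(post_reads)
    pre = frequencies(pre_reads)
    hits = []
    for cdr3, pct in post.items():
        fold = pct / max(pre.get(cdr3, 0.0), floor_pct)
        if pct >= min_post_pct and fold >= min_fold:
            hits.append((cdr3, pct, fold))
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    acute = ["AAA"] * 8 + ["BBB"] * 2
    convalescent = ["AAA"] * 1 + ["BBB"] * 9
    print(enriched_cdr3(acute, convalescent))
```

In practice the frequencies would come from the clustered CDR3 counts of each library rather than from raw read lists.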

Preferably, the selection in step (3) of candidate CDR3 and the corresponding heavy chain and light chain antibody nucleic acid sequences as candidate pairing sequences includes the following steps:

(1′) Select candidate CDR3 homologous clusters;

(2′) Anchor the CDR3 homologous cluster, and select from the gene spectrum of the antibody variable regions the full-length amino acid sequences of the high-frequency antibody heavy chain and light chain variable regions containing the CDR3 homologous cluster as the first and second pairing candidate sequences;

(3′) Determine the nucleic acid sequences of the first and second pairing candidate sequences.

Preferably, the selection in step (2′) of the full-length amino acid sequences of the high-frequency antibody heavy chain and light chain variable regions containing the candidate CDR3 homologous cluster includes the following steps: select all Read1 and Read2 of the CDR3 homologous cluster from the gene spectrum of the antibody variable regions and compare them against the antibody database to obtain the most frequent amino acid sequences of the CDR regions and FR regions; at the same time, splice the Read1 and Read2 sequencing results and compare all spliced sequences against the antibody database to determine the most frequent amino acid sequences of the CDR regions and FR regions; compare and combine the amino acid sequences of the CDR regions and FR regions obtained from the two sources to obtain the full-length amino acid sequence of the antibody variable region corresponding to the CDR3 homologous cluster as the candidate pairing sequence; and select the corresponding highest-frequency full-length nucleotide sequence of the variable region from the Read1 and Read2 spliced sequences.
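The step of taking the most frequent amino acid sequence for each CDR and FR region can be sketched as below. The input format (one dictionary per annotated read, mapping region names to amino acid sequences) is an assumption made for illustration and is not the output format of any particular tool; regions a read does not cover are simply absent or empty.

```python
from collections import Counter

REGIONS = ("FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4")


def most_frequent_regions(annotated_reads):
    """For each region, return the amino acid sequence seen most often across the reads."""
    consensus = {}
    for region in REGIONS:
        counts = Counter(read[region] for read in annotated_reads if read.get(region))
        if counts:
            consensus[region] = counts.most_common(1)[0][0]
    return consensus


if __name__ == "__main__":
    reads = [
        {"FR1": "QITLKESG", "CDR1": "GFSLSTSGVG"},  # a read covering the 5' part
        {"FR1": "QITLKESG", "CDR1": "GFSLSTSGVG"},
        {"FR1": "QVTLKESG"},                        # a less frequent variant
        {"CDR3": "THRRPSLRYPDV", "FR4": "WGQG"},    # a read covering the 3' part
    ]
    print(most_frequent_regions(reads))
```

Because Read1 and Read2 each cover only part of the variable region, counting per region lets every read contribute to the regions it does cover.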

Preferably, the full-length nucleotide sequence of the variable region described in step (3′) is compared with the sequence of the highest-frequency antibody variable region containing the specific CDR3 homologous cluster obtained by RT library construction, and the full-length sequence of the antibody variable region is obtained as a candidate pairing sequence.

In the invention, for some antibody variable regions, the splicing rate is very low because they contain a longer CDR region, FR region or 5′UTR region, so it is difficult to obtain the full-length sequence of the high-frequency variable region by sequential selection with the CDR3 locked. The invention therefore also constructs a library by the RT method to compensate for the low splicing rate of the RACE library.

In the invention, the method of obtaining the full-length antibody sequence by combining RACE library sequencing and RT library sequencing is as follows: perform RACE library construction and sequencing and RT library construction and sequencing in parallel on the same sample, and compare the consistency of the high-frequency CDR3 in the two libraries; when the consistency reaches a certain standard, lock the CDR3 and select the highest-frequency DNA sequence containing that CDR3 in the RT library.
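One way to picture the RACE/RT combination is sketched below. The consistency measure (the fraction of the top-n RACE CDR3 that also appear in the top-n RT CDR3) is an assumption, since the text only speaks of "a certain standard"; all function names are hypothetical.

```python
from collections import Counter


def top_cdr3(cdr3_counts, n):
    """Set of the n most frequent CDR3 in a {cdr3: count} mapping."""
    return {cdr3 for cdr3, _ in Counter(cdr3_counts).most_common(n)}


def top_overlap(race_counts, rt_counts, n=10):
    """Fraction of the top-n RACE CDR3 that are also among the top-n RT CDR3."""
    race_top = top_cdr3(race_counts, n)
    rt_top = top_cdr3(rt_counts, n)
    return len(race_top & rt_top) / max(len(race_top), 1)


def pick_rt_sequence(rt_reads, locked_cdr3):
    """Most frequent variable-region DNA sequence in the RT library carrying the locked CDR3.

    rt_reads is an iterable of (cdr3_amino_acids, variable_region_dna) pairs.
    """
    counts = Counter(dna for cdr3, dna in rt_reads if cdr3 == locked_cdr3)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    race = {"A": 50, "B": 30, "C": 5}
    rt = {"A": 40, "C": 20, "D": 10}
    print(top_overlap(race, rt, n=2))
    reads = [("A", "ACGT"), ("A", "ACGT"), ("A", "ACGA"), ("B", "TTTT")]
    print(pick_rt_sequence(reads, "A"))
```

A correlation of CDR3 frequencies between the two libraries, as reported in FIG. 6, would be an alternative consistency measure.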

Preferably, the method also includes the identification of the selected antibodies.

Preferably, the identification includes testing the binding of the obtained antibody to the at least one target antigen;

Preferably, the identification steps include expressing scFv, Fab fragments or IgG of the obtained antibody in vitro, and testing the binding of the resulting scFv, Fab fragments or IgG to the target antigen;

Preferably, the in vitro expression method is to express scFv or Fab fragments by display in prokaryotic cell, phage or yeast systems, or to express Fab or IgG as an exogenous gene in mammalian cells.

Compared with the prior art, the invention has the following beneficial effects:

(1) The analysis method of the invention adopts paired-end sequencing followed by splicing and then selection, which compensates for the limited read length of Illumina sequencing, so that the full-length sequence can be obtained more accurately;

(2) The invention adopts the optimized two-step PCR amplification for library construction, which reduces the RNA input of the sample to 20 ng, simplifies the library construction process, and maintains the coverage and accuracy of the acquired gene spectrum of the antibody variable regions;

(3) Because it is difficult to obtain the full length of the antibody variable region with the existing Illumina Miseq 2x300 system, the invention develops an optimized method of selecting the full length of the antibody variable region, obtains the full-length sequence of the antibody variable region by combining the results of RACE and RT library construction and sequencing, and reduces the bias due to incomplete data;

(4) The method of discovering specific antibodies in the invention is simple, fast and low in cost, can be applied to very small samples, and lays a foundation for the discovery of a large number of antibodies.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of the optimized discovery method for antibodies in the invention.

FIG. 2 is a schematic diagram of the concrete steps of embodiment 1 of the invention.

FIG. 3 is the correlation analysis of antibody gene CDR3 frequency at different RNA inputs: FIG. 3(a) is the comparison of heavy chain CDR3 when RNA input is 600 ng and 20 ng; FIG. 3(b) is the comparison of heavy chain CDR3 when RNA input is 600 ng and 100 ng; FIG. 3(c) is the comparison of kappa CDR3 when RNA input is 600 ng and 20 ng; FIG. 3(d) is the comparison of kappa CDR3 when RNA input is 600 ng and 100 ng; FIG. 3(e) is the comparison of lambda CDR3 when RNA input is 600 ng and 20 ng; FIG. 3(f) is the comparison of lambda CDR3 when RNA input is 600 ng and 100 ng.

FIG. 4 is a pairing expression result of a candidate dengue virus antibody of embodiment 2 of the invention.

FIG. 5 is a pairing expression result of a candidate flu virus antibody of embodiment 3 of the invention.

FIG. 6 is the correlation analysis of CDR3 frequency in RT and RACE libraries: FIG. 6(a) is the correlation analysis of the CDR3 frequency of the heavy chain in RT and RACE libraries; FIG. 6(b) is the correlation analysis of the CDR3 frequency of the Lambda chain in RT and RACE libraries.

SPECIFIC IMPLEMENTATIONS

In order to further elaborate the technical means adopted by the invention and its effects, the technical scheme of the invention is further explained below with reference to the drawings and concrete embodiments, but the invention is not limited to the scope of the embodiments.

Embodiment 1 An Optimized High-Throughput Library Construction Method for Few Samples

The concrete steps of library construction are shown in FIG. 2. Sample preparation and first-strand cDNA synthesis: peripheral blood mononuclear cells (PBMC) are isolated from human or animal peripheral blood by density gradient centrifugation. Total RNA is isolated from PBMC with Trizol. 20 ng, 100 ng and 600 ng of RNA are used as template and oligo(dT) is used as primer for first-strand cDNA synthesis. The concrete operation can be carried out by referring to the manual of the SMARTer® RACE 5′/3′ Kit.

The optimized two-step PCR method is used to construct the antibody library. In the first round of PCR amplification, the forward primer is the optimized SMARTer® RACE 5′/3′ Kit UPM carrying the partial Illumina joint sequence, and the reverse primer contains the sequence of the antibody constant region and the partial Illumina joint sequence. VH, VK and VL are amplified separately. The reaction system is as follows:

Reaction system   Component                                Volume
PCR reaction      1 × TransStart® FastPfu Fly Buffer       5 μL
                  Optimized UPM                            5 μL
                  Reverse primer                           0.2 μL
                  TransStart® FastPfu Fly DNA Polymerase   0.5 μL
                  cDNA                                     5 μL
                  dNTPs                                    0.2 mM
                  Total volume                             50

PCR reaction conditions are as follows:

                        Reaction program       Cycle number
Amplification program   95°C 2 min             1
                        95°C 30 s              25
                        67°C-62°C 30 s
                        72°C 30 s
                        72°C 7 min             1

The first round of PCR products are recovered by magnetic beads.

The primers of the second round of PCR are conventional Illumina library construction primers, and the reaction system is as follows:

Reaction system   Component                                Volume
PCR reaction      1 × TransStart® FastPfu Fly Buffer       5 μL
                  Forward primer                           0.2 μM
                  Reverse primer                           0.2 μM
                  TransStart® FastPfu Fly DNA Polymerase   0.5 μL
                  First round PCR recovered product        20 μL
                  dNTPs                                    0.2 mM
                  Total volume                             50

PCR reaction conditions are as follows:

                        Reaction program       Cycle number
Amplification program   95°C 2 min             1
                        95°C 30 s              6-8
                        60°C 30 s
                        72°C 30 s
                        72°C 7 min             1

For gel excision and purification of the products, refer to the manual of the QIAquick Gel Extraction Kit. After the library is constructed, a Bioanalyzer High Sensitivity DNA chip is used for QC, and sequencing is performed on the Illumina Miseq 2x300 platform. The results are shown in FIG. 3(a)-(f).

The results in FIG. 3(a)-(f) show that, comparing the 20 ng RNA input with the 600 ng RNA input, the frequency correlation R² values of the heavy chain, Kappa chain and Lambda chain CDR3 are 0.83, 0.90 and 0.98 respectively, indicating a high correlation. The figure also shows that the ranking of high-frequency CDR3 is consistent: for example, the CDR3 ranking in the top 10 of the gene spectrum of the antibody variable regions are highly consistent across the 600 ng, 100 ng and 20 ng RNA inputs. The comparison of CDR3 frequency correlation demonstrates that the RACE library constructed with 20 ng RNA input meets the requirements of antibody library sequencing in terms of antibody gene coverage and CDR3 region bias.
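A frequency correlation of this kind can be computed along the following lines. This sketch assumes that R² is the squared Pearson correlation of per-CDR3 frequencies and that a CDR3 missing from one library counts as frequency 0; the text does not state how its R² values were calculated, and the function names are hypothetical.

```python
def paired_frequencies(freq_a, freq_b):
    """Align two {cdr3: frequency} mappings on the union of their CDR3, filling gaps with 0."""
    keys = sorted(set(freq_a) | set(freq_b))
    return [freq_a.get(k, 0.0) for k in keys], [freq_b.get(k, 0.0) for k in keys]


def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    input_600ng = {"AAA": 8.0, "BBB": 4.0, "CCC": 1.0}  # made-up frequencies (%)
    input_20ng = {"AAA": 7.5, "BBB": 4.4, "DDD": 0.5}
    x, y = paired_frequencies(input_600ng, input_20ng)
    print(round(r_squared(x, y), 3))
```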

Embodiment 2 An Optimized Antibody Gene Screening Method for Dengue Virus Monoclonal Antibody

Using the RACE antibody library construction method described in embodiment 1, total RNA is isolated from PBMC of dengue patients in the acute and convalescent stages, and RACE libraries of the heavy chain, Kappa chain and Lambda chain are constructed separately. The libraries are sequenced on the Illumina Miseq 2x300 system: the heavy chain library is sequenced alone, and the Kappa and Lambda chain libraries are combined and sequenced together. The sequencing results are processed with the software Trimmomatic-0.30 to remove poor-quality data, with the parameters set as follows: phred33, LEADING:20, TRAILING:20, SLIDINGWINDOW:20:20, MINLEN:200. The following improvements are made to data analysis. Comparing the amount of data in the antibody library with Singleton sequences removed and with them retained shows that the amount of data changes greatly after Singletons are removed, so Singleton sequences cannot be removed. Repeated sequence normalization, in turn, reduces the occupation of server resources: first, identify each distinct sequence and its number of repeats, then analyze each distinct sequence once by IgBlast, and obtain the frequency of each CDR3 by counting the repeats.
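The repeated sequence normalization described above amounts to collapsing identical reads, annotating each distinct sequence once, and weighting the result by the repeat count. A minimal sketch follows; `annotate` is a hypothetical callable that stands in for an IgBlast run and returns the CDR3 of one sequence (or None), and it is not a real IgBlast interface.

```python
from collections import Counter


def collapse_reads(reads):
    """Map each distinct read sequence to its number of repeats; singletons are kept."""
    return Counter(reads)


def cdr3_frequencies(read_counts, annotate):
    """Percentage frequency of each CDR3, annotating every distinct sequence only once."""
    cdr3_counts = Counter()
    for sequence, repeats in read_counts.items():
        cdr3 = annotate(sequence)          # one annotation call per distinct sequence
        if cdr3 is not None:
            cdr3_counts[cdr3] += repeats   # weight by the number of identical reads
    total = sum(cdr3_counts.values())
    if total == 0:
        return {}
    return {cdr3: 100.0 * n / total for cdr3, n in cdr3_counts.items()}


if __name__ == "__main__":
    reads = ["s1", "s1", "s1", "s2"]
    lookup = {"s1": "AAA", "s2": "BBB"}    # toy stand-in for real annotation
    print(cdr3_frequencies(collapse_reads(reads), lookup.get))
```

The saving comes from the annotation step: its cost scales with the number of distinct sequences rather than with the number of reads.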

TABLE 1 Comparison of data amount of antibody library before and after removing Singleton sequence

Library         Reads amount before removing singleton   Reads amount after removing singleton
6H-R1           30179                                    4573
6H-R2           30179                                    471
6K-R1 / 6L-R1   574845                                   141848
6K-R2 / 6L-R2   574845                                   16785

TABLE 2 Comparison of data amount of antibody library before and after repeated sequence normalization

Library         Reads amount before repeated sequence normalization   Reads amount after repeated sequence normalization
6H-R1           30179                                                 27278
6H-R2           30179                                                 29913
6K-R1 / 6L-R1   574845                                                463091
6K-R2 / 6L-R2   574845                                                563921

TABLE 3 Comparison of library quality of dengue patients in acute and convalescent stages

Library                                       Sequence amount   Sequence amount after organization   Q30 index after organization
Convalescent stage H                          2365300           812294                               81.90
Convalescent stage K / Convalescent stage L   1569720           647412                               82.88
Acute stage H                                 4721106           60358                                83.98
Acute stage K / Acute stage L                 1726272           1149690                              82.45

Because CDR3 sequences are almost unique markers of different antibody genes, the abundance of an antibody in the gene spectrum of the antibody variable regions can be evaluated from the number of CDR3 sequences obtained. The sequences of the FR and CDR regions are determined with IgBlast, but IgBlast does not determine the sequence of the CDR3 region, so the characteristic sequence of the FR4 region is used to determine the CDR3 amino acid sequence: the characteristic sequence at the start of heavy chain FR4 is WG*G or GQG, and that of the Kappa and Lambda chains is FG*GT. IgBlast is run separately on the Read1 and Read2 sequences and the results are merged, and the pairing candidate CDR3 sequences are selected based on one or all of the following principles: 1) high-frequency sequences after CDR3 clustering; 2) CDR3 sequences whose frequency after immunization or in the outbreak stage is significantly higher than before immunization or in the convalescent stage; 3) CDR3 sequences whose corresponding V gene family differs significantly between the post-immunization or outbreak stage and the pre-immunization or convalescent stage.

The CDR3 frequency changes of the heavy chain and light chains of dengue patients in the acute and convalescent stages are shown in Tables 4, 5 and 6. The results show that the frequency of these CDR3 in the acute stage is significantly higher than in the convalescent stage, indicating that they may be related to the specific antigen.
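Locating the end of CDR3 from the FR4 characteristic sequence can be sketched with regular expressions. The sketch assumes that `*` in WG*G and FG*GT stands for any single residue and that the end of FR3 is already known from the annotation; the function name and the `fr3_end` parameter are hypothetical.

```python
import re

# Characteristic sequences at the start of FR4, with "*" read as any single residue
FR4_START = {
    "heavy": re.compile(r"WG.G|GQG"),
    "light": re.compile(r"FG.GT"),  # Kappa and Lambda chains
}


def extract_cdr3(aa_seq, fr3_end, chain):
    """Return the residues between the end of FR3 (index fr3_end) and the start of FR4."""
    match = FR4_START[chain].search(aa_seq, fr3_end)
    if match is None:
        return None
    return aa_seq[fr3_end:match.start()]


if __name__ == "__main__":
    fr3 = "RYSPSLRSRLTISKDTSKNQVVLTMTNLDPVDTATYFC"
    heavy = fr3 + "THRRPSLRYPDV" + "WGQG"
    print(extract_cdr3(heavy, len(fr3), "heavy"))
```

Starting the search at the end of FR3 avoids accidental matches of the short FR4 motif inside the framework regions upstream.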

TABLE 4 Analysis of the CDR3 frequency of heavy chain of dengue patients in the acute and convalescent stages

Heavy chain CDR3 sequence   Frequency of acute stage (%)   Frequency of convalescent stage (%)   CDR3 cluster (%)
ARALSEKSLTTSYLDC            8.53                           0.11                                  9.76
VKDASTTSIGAAPFDY            5.97                           0.03                                  10.04
THRRPSLRYPDV                5.45                           0.15                                  5.62
ASLGGTVTDAFDL               3.13                           0.03                                  5.55
AKDASTTSKGAAPFDY            2.49                           0                                     3.14
ARGFTYGHYFDY                2.23                           0                                     2.24
VKDASTTSIGAAPFDN            2.05                           0                                     10.04
AKTVVTAPGVFDY               1.86                           0                                     2.15
ASDILDAFDV                  1.84                           0                                     1.85
VKDASTTSIGAAPFDS            1.72                           0                                     10.04
ASLGGTVTDAFDV               1.21                           0                                     5.55
ASLGGTVTDAFDI               1.15                           0                                     5.55
ATGYTYGYYFDY                1.00                           0                                     1.01

TABLE 5 Analysis of CDR3 frequency of Kappa chain of dengue patients in the acute and convalescent stages

Kappa chain CDR3 sequence   Frequency of acute stage (%)   Frequency of convalescent stage (%)   CDR3 cluster (%)
QQANSFPWT                   5.94                           0.06                                  6.31
HQYSSWPPGGT                 4.87                           0.18                                  5.22
HQYNSWPPGGT                 2.83                           0                                     3.04
HQYNAWPPGGT                 2.81                           0                                     3.78
MQGTHWRS                    2.63                           0                                     2.68
QQYSFWPWT                   2.45                           0                                     2.56
QQYDSFPLT                   1.68                           0                                     1.75
QQANNFPWT                   1.57                           0                                     1.66
HQYTSWPPGGT                 1.28                           0                                     1.37
QQRSNWPRT                   1.24                           0.01                                  1.3
QQISNFPIT                   1.17                           0                                     1.2
QQSFSPPWT                   1.09                           0.07                                  1.17
QQSGSSLH                    1.02                           0                                     1.27
QQRSTWPYT                   1                              0.07                                  1.09

TABLE 6 Analysis of CDR3 frequency of Lambda chain of dengue patients in the acute and convalescent stages

Lambda chain CDR3 sequence   Frequency of acute stage (%)   Frequency of convalescent stage (%)   CDR3 cluster (%)
CSYAASYYDTGV                 10.08                          1.06                                  10.96
SSYRSISPFYV                  5.2                            0.64                                  5.45
SSYRSSSPFYV                  5.06                           0.52                                  5.18
QVWDRSTNHRV                  4.49                           0.97                                  4.53
QVWDRSSDHRL                  3.69                           0.61                                  3.82
QSYDTSLRAGV                  3.58                           0.65                                  3.67
SSYTGSSTV                    2.06                           0.19                                  2.06
LLSYGGAPCV                   1.5                            0.27                                  1.58
CSYAGRSTWV                   1.48                           0.04                                  1.53
QSFDDSLSGWV                  1.38                           0.24                                  1.4
QVWDRSTNHRL                  1.26                           0.15                                  1.28
CSYAGSNTWI                   1.24                           0.21                                  2.55
CSYAGSNTWV                   1.16                           0.17                                  2.55

Anchor the candidate CDR3 sequences. Taking the heavy chain CDR3 sequence THRRPSLRYPDV as an example, run IgBlast separately on all Read2 and the corresponding Read1 containing the CDR3 homologous cluster to obtain the amino acid sequences of the CDR and FR regions, namely CDR1, CDR2, CDR3, FR1, FR2, FR3 and FR4. The results of the IgBlast analysis are combined into the R12 file. In parallel, the R1 and R2 sequencing results are spliced, the amino acid and nucleotide sequences of each domain are determined by running IgBlast on the spliced sequences, and the Contig file is obtained.

The full-length sequence of the variable region is selected as follows. Taking the heavy chain CDR3 THRRPSLRYPDV homologous cluster as an example, the highest-frequency variable region of the antibody gene is screened from the R12 file. The amino acid sequence of each domain is as follows (see also Table 7):

FR1(QITLKESGPMLVKPTQTLTLTCTFS)-CDR1(GFSLSTSGVG)-FR2(VGWIRQPPGEALEWLAI)-CDR2(IYWDDDK)-FR3(RYSPSLRSRLTISKDTSKNQVVLTMTNLDPVDTATYFC)-FR4(WGQG).

Further, the corresponding nucleotide sequences are selected in turn from the Contig file in the order nnFR4-nnCDR3-nnFR3-nnCDR2-nnFR2-nnCDR1-nnFR1-contig. The full-length nucleotide sequence of the antibody variable region with the highest frequency is thus obtained, which should contain the complete FR and CDR regions and the signal peptide sequence.
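One possible reading of this sequential selection is that, region by region in the stated order, only the contigs carrying the most frequent nucleotide sequence of that region are kept, and the most frequent full contig among the survivors is returned. The sketch below follows that reading; the record layout (`"nn"` for per-region nucleotide sequences, `"contig"` for the full spliced sequence) is a hypothetical format, not that of the Contig file itself.

```python
from collections import Counter

# Selection order given in the text: nnFR4 first, nnFR1 last
ORDER = ("FR4", "CDR3", "FR3", "CDR2", "FR2", "CDR1", "FR1")


def select_contig(contigs):
    """Narrow the contigs region by region, then return the most frequent full sequence."""
    pool = list(contigs)
    if not pool:
        return None
    for region in ORDER:
        best = Counter(c["nn"][region] for c in pool).most_common(1)[0][0]
        pool = [c for c in pool if c["nn"][region] == best]
    return Counter(c["contig"] for c in pool).most_common(1)[0][0]


if __name__ == "__main__":
    def record(fr1, full):
        regions = {region: "ACG" for region in ORDER}
        regions["FR1"] = fr1
        return {"contig": full, "nn": regions}

    contigs = [record("AAA", "X"), record("AAA", "X"), record("CCC", "Y")]
    print(select_contig(contigs))
```

Anchoring on the 3′ regions first keeps the selection tied to the locked CDR3 before the more variable 5′ end is considered.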

TABLE 7 Heavy chain CDR3 THRRPSLRYPDV homologous cluster: corresponding combination of the highest frequency FR and CDR regions

Region   Heavy chain-THRRPSLRYPDV-R12 highest frequency amino acid sequence
FR1      QITLKESGPMLVKPTQTLTLTCTFS
CDR1     GFSLSTSGVG
FR2      VGWIRQPPGEALEWLAI
CDR2     IYWDDDK
FR3      RYSPSLRSRLTISKDTSKNQVVLTMTNLDPVDTATYFC
FR4      WGQG

Taking the Kappa chain HQYSSWPPGGT homologous cluster as an example, the most abundant sequence combination of FR and CDR regions is screened from the R12 file. The amino acid sequence of each region is as follows: FR1(EIVMTQSPATLSVSPGERATLSCRAS)-CDR1(QSVSSN)-FR2(LAWYQHKPGQAPRLLLY)-CDR2(GAS)-FR3(TRAAGIPDRFSGSGSGTEFTLTISSLQSEDFAVYFC)-FR4(FGQGT), see Table 8. The corresponding highest-frequency nucleotide sequences are selected in turn from the Contig file in the order nnFR4-nnCDR3-nnFR3-nnCDR2-nnFR2-nnCDR1-nnFR1-contig, and the corresponding highest-frequency full-length DNA sequence of the antibody variable region is obtained.

TABLE 8 Kappa chain CDR3 HQYSSWPPGGT homologous cluster: corresponding combination of the highest frequency FR and CDR regions

Region   Kappa-HQYSSWPPGGT-R12 highest frequency amino acid sequence
FR1      EIVMTQSPATLSVSPGERATLSCRAS
CDR1     QSVSSN
FR2      LAWYQHKPGQAPRLLLY
CDR2     GAS
FR3      TRAAGIPDRFSGSGSGTEFTLTISSLQSEDFAVYFC
FR4      FGQGT

This method of selecting antibody variable region sequences has the following benefits. Singleton reads are retained while the sequencing data are processed. The splicing rates of the variable regions of antibody genes corresponding to different high-frequency CDR3 differ significantly: the full-length splicing rate of the heavy chain high-frequency CDR3 sequences in Table 9 ranges from 34% to 91%. If the spliced sequences were analyzed directly, in some cases the data loss could be huge and the CDR3 frequency analysis could even be biased. In the current process, the R1 and R2 sequences are first used for CDR3 composition analysis, which avoids the loss of effective data and helps to obtain accurate CDR3 frequencies. The highest-frequency amino acid sequences of the FR and CDR regions are then obtained from the R1 and R2 files, and the highest-frequency nucleotide sequences of the antibody variable region are selected from the Contig sequence file, which effectively guarantees that the selected variable region sequences are the highest-frequency sequences and guarantees the integrity and accuracy of the sequence information. A separate selection of the FR and CDR region sequences in the R12 file also verifies the accuracy of the current process, as shown in FIG. 4.

TABLE 9  Analysis of sequencing and splicing of variable region of antibody

Heavy chain CDR3_Seq   Reads amount   Splicing reads amount   Splicing rate (%)
ARALSEKSLTTSYLDC       2394           810                     33.8
VKDASTTSIGAAPFDY       1676           730                     43.6
THRRPSLRYPDV           1531           1387                    90.6
ARGFTYGHYFDY           628            415                     66.1
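The splicing rates in Table 9 follow from the two read counts, as the short Python sketch below shows. It assumes that the splicing rate is the number of spliced reads divided by the total number of reads for that CDR3, expressed as a percentage; the function name `splicing_rate` is illustrative only.

```python
# (CDR3 sequence, reads amount, splicing reads amount), as listed in Table 9.
table9 = [
    ("ARALSEKSLTTSYLDC", 2394, 810),
    ("VKDASTTSIGAAPFDY", 1676, 730),
    ("THRRPSLRYPDV", 1531, 1387),
    ("ARGFTYGHYFDY", 628, 415),
]


def splicing_rate(reads, spliced_reads):
    """Percentage of reads for a CDR3 that were spliced into a full-length contig."""
    return 100.0 * spliced_reads / reads


rates = {cdr3: round(splicing_rate(n, s), 1) for cdr3, n, s in table9}
for cdr3, rate in rates.items():
    print(f"{cdr3}: {rate}%")
```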

Analysis of antibody gene pairing and expression: an expression frame is constructed by overlapping PCR, and the expression frame includes a promoter, a variable region and an antibody constant region. The selected antibody heavy chain genes and light chain genes are combined and paired, then transfected into 293FT cells for expression. The supernatant is collected for ELISA assay and its affinity to the antigen is tested.

The concrete implementation steps are as follows. Expression frames are constructed for the variable regions of the candidate heavy chains and light chains by bridging PCR. The expression frame includes a promoter, a variable region and a constant region of human antibody. The plasmids containing the heavy chain and light chain expression frames are paired and transfected into 293FT cells for in vitro expression and antibody analysis. 293T cells are collected, the cell density is adjusted, and the cells are inoculated into a 28-well plate at 1.2×10⁵ cells/well; after overnight cultivation in an incubator at 37° C. and 5% CO2, the cells are transfected when the cell density reaches 60-80%. The plasmids, 0.25 μg heavy chain and 0.75 μg light chain, are incubated at room temperature for 5 minutes, mixed with the transfection reagent and then incubated at room temperature for 20 min to form the transfection complex. The transfection complex is added to the cell wells and mixed gently, and the cells are then cultivated in an incubator at 37° C. and 5% CO2 for 72 hours. The supernatant is collected and its activity is detected by ELISA.

For the ELISA, an anti-human IgG(Fc) and a detection antigen are first diluted to 10 μg/mL with pH 9.6 carbonate coating buffer, and a 96-well microtiter plate is coated with 100 μL/well at 4° C. overnight or at 37° C. for 2 hours. The plate is then blocked with 4% skim milk powder in PBS at 300 μL/well for 1-2 h at 37° C. The liquid in the microtiter plate is discarded and the plate is washed three times with PBST; the supernatant of cells cultured for 48 hours after transfection is then added at 100 μL/well and incubated at 37° C. for 1 h. Culture medium and PBS controls are set up. The liquid in the microtiter plate is discarded and the plate is washed three times with PBST, and HRP goat anti-human IgG(Fc) at 1:2000 and HRP goat anti-human IgG at 1:5000 are added respectively at 100 μL/well and incubated at 37° C. for 1 h. The liquid in the microtiter plate is discarded, the plate is washed five times with PBST, and OPD chromogenic solution is added at 100 μL/well for color development protected from light. The absorbance at OD490 is read with a microplate reader.

Four heavy chains are paired with five Kappa chains and four Lambda chains respectively. ELISA results (FIG. 4) show that no positive result is obtained from the pairing of Kappa chains with heavy chains, and five positive clones are obtained from the pairing of Lambda chains with heavy chains. The positive rate is 5/36.
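The denominator of the positive rate can be reproduced by enumerating the pairings, as in the Python sketch below. The chain labels (H1, K1, L1 and so on) are placeholders rather than identifiers from the experiment; only the chain counts and the number of positive clones are taken from the text.

```python
from itertools import product

# Placeholder labels; only the number of chains of each type comes from the text.
heavy_chains = [f"H{i}" for i in range(1, 5)]   # four heavy chains
kappa_chains = [f"K{i}" for i in range(1, 6)]   # five Kappa chains
lambda_chains = [f"L{i}" for i in range(1, 5)]  # four Lambda chains

# Every heavy chain is paired with every light chain.
pairs = list(product(heavy_chains, kappa_chains + lambda_chains))

positive_clones = 5  # all from Lambda/heavy pairings, per the ELISA results
positive_rate = positive_clones / len(pairs)
print(f"{positive_clones}/{len(pairs)} = {positive_rate:.1%}")
```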

Embodiment 3 An Optimized Antibody Gene Screening Method for Flu Virus Monoclonal Antibody

Using the RACE antibody library construction method described in embodiment 1, total RNA is isolated from PBMC of volunteers before and 7 days after injection of influenza vaccine, and RACE libraries of the heavy chain, Kappa chain and Lambda chain are constructed respectively according to the method described in embodiment 2. High-throughput sequencing is performed on the Illumina Miseq 2x250 system. The methods of data processing, the determination of FR and CDR regions, and the selection of candidate CDR3 amino acid sequences are the same as those described in embodiment 1.

TABLE 10  CDR3 frequency analysis of heavy chain before and after immunization of influenza vaccine

Heavy chain           Frequency on 7th day      Frequency before    V gene         Frequency change
CDR3_Seq              after immunization (%)    immunization (%)                   of V gene (%)
MSWNDRVVAP            4.84                      0.07                IGHV2-70*13    100%
SVVSFPPY              4.17                      0.05                IGHV3-74*01    75%
EVGGERAY              2.88                      0.06                IGHV3-7*01     25%
DRDASGDFDI            1.95                      0.03                IGHV1-3*01     0%
APGGQWAY              1.21                      0.01                IGHV3-7*01     25%
SSVVSFPPY             0.97                      0.03                IGHV3-74*01    75%
DRVTGDNFYYYMGV        0.78                      0.02                IGHV3-23*01    0%
GSTIMVTL              0.76                      0.01                IGHV3-53*01    100%
LQFFEGRHMDV           0.67                      0                   IGHV2-70*13    100%
DRVASGDFDI            0.47                      0.02                IGHV3-23*01    0%
LLSGGENPSYYYHMDV      0.4                       0.01                IGHV2-70*01    100%
KTYGSGSFDYFDY         0.35                      0.01                IGHV4-59*01    40%
GATVVNDLEY            0.33                      0                   IGHV3-53*01    100%
GGTIRVTL              0.33                      0.01                IGHV3-53*01    100%
MNWNDRVVDP            0.33                      0                   IGHV2-70
GGTIMVTL              0.32                      0                   IGHV3-53*01    100%
DYLSGTYTPPLY          0.32                      0                   IGHV1-24*01    100%
ERGEVGDSVDNFYYYMDV    0.28                      0                   IGHV4-59*01    40%
PMTYYYDISDAGAYYFDT    0.28                      0.01                IGHV4-4*02     0
PPTAHHFNAFYI          0.28                      0.01                IGHV4-39*01    22%

TABLE 12  CDR3 frequency analysis of Lambda chain before and after immunization of influenza vaccine

Lambda chain    Frequency on 7th day      Frequency before    V gene            Frequency change
CDR3_Seq        after immunization (%)    immunization (%)                      of V gene (%)
AAWDDDLSGPV     16.83722                  0.23                IGLV1-47*01       100%
AAWDDSQNGPL     2.558374                  0.03                IGLV1-44*01       20%
QT              1.711464                  3.48                IGLV4-69*01       -100%
ATWDDNLSGPV     1.589042                  0.02                IGLV1-47*01       100%
QV              1.483721                  0                   IGLV3-21*01       100%
AAWDDNFSGAEV    1.269279                  0.01                IGLV1-47*01       100%
SSYTSSSTPWV     1.175902                  0.05                IGLV2-14*01/03    0%
AVWDNSLNGFYV    1.161244                  0.01                IGLV1-44*01       20%
AAWDDSLSPPEV    1.099626                  0.01                IGLV1-47*01/02    100%
ATWDDDLSGPV     1.037193                  0.01                IGLV1-47*01/02    100%

TABLE 11  CDR3 frequency analysis of Kappa chain before and after immunization of influenza vaccine

Kappa chain    Frequency on 7th day      Frequency before    V gene          Frequency change
CDR3_Seq       after immunization (%)    immunization (%)                    of V gene (%)
QQRSDWPYT      2.993274                  0.01                IGKV3-11*01     -9%
QQYDA          2.178924                  0.01                IGKV1-33*01     100%
HQSSIRSWT      1.560191                  0.01                IGKV1-39*01     21%
LQHNTYPQT      1.399475                  0.01                IGKV1D-17*02    100%
QQSSTSSWT      1.274091                  0                   IGKV1-39*01     21%
QQYNNWPPYT     1.19955                   0.48                IGKV3-15*01     -31%
MQGKYWPT       1.103035                  0.01                IGKV2-30*01     100%
QQYRGSSCT      1.060809                  0                   IGKV3-20*01     50%
QQYGSSPPWT     1.020738                  0.07                IGKV3-20*01     0.5
QQYDYSSST      0.998763                  0                   IGKV3-20*01     0.5
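One way to use the before and after frequencies is to rank CDR3 sequences by how strongly they are enriched after immunization. The Python sketch below is an illustration only: the thresholds `min_after` and `min_fold` and the pseudo-frequency substituted for a pre-immunization frequency of 0 are assumptions and are not values given in this document. The rows are a subset of the Kappa chain data.

```python
# (CDR3, frequency 7 days after immunization %, frequency before immunization %)
kappa_rows = [
    ("QQRSDWPYT", 2.993274, 0.01),
    ("QQYDA", 2.178924, 0.01),
    ("QQSSTSSWT", 1.274091, 0.0),
    ("QQYNNWPPYT", 1.19955, 0.48),
    ("QQYGSSPPWT", 1.020738, 0.07),
]


def enriched_cdr3(rows, min_after=1.0, min_fold=10.0, pseudo=0.01):
    """Return CDR3 sequences whose frequency rose at least `min_fold`-fold.

    `pseudo` replaces a pre-immunization frequency below it (including 0)
    so that the fold change stays finite. All three parameters are
    hypothetical defaults chosen for this example.
    """
    selected = []
    for cdr3, after, before in rows:
        fold = after / max(before, pseudo)
        if after >= min_after and fold >= min_fold:
            selected.append((cdr3, fold))
    return selected


candidates = [cdr3 for cdr3, _ in enriched_cdr3(kappa_rows)]
print(candidates)
```

With these example thresholds, QQYNNWPPYT is excluded because its frequency was already 0.48% before immunization.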

The expression frame is constructed by fusion PCR; the frame includes a promoter, a variable region and an antibody constant region.

The selected antibody heavy chain genes and light chain genes are randomly paired and transfected into 293FT cells for expression; the supernatant is collected for ELISA assay and its affinity to the antigen is tested.

For the concrete methods of in vitro expression, refer to the dengue virus case. ELISA results show that one functional antibody each is obtained from the pairing of candidate Kappa chains with heavy chains and from the pairing of candidate Lambda chains with heavy chains. As can be seen from FIG. 5, the success rate of pairing is 2/15.

Embodiment 4 Full-Length Sequences of Variable Region of Antibody Obtained by Combining the Sequencing Results of RACE and RT Libraries

The optimized full-length sequence selection method for the variable region described in embodiment 2 may partially compensate for the insufficient read length. However, for some antibody variable regions, for example those containing long CDR and FR regions or a long 5′ UTR, the splicing rate may be low, and it is difficult to obtain the full-length sequence of the high frequency variable region by the described method of selecting in turn with the CDR3 locked. For example, among the four highest-frequency CDR3 sequences in Table 9, the splicing rate of two CDR3 is about 40%.

A method is developed to obtain the full-length sequence of the antibody variable region by combining the sequencing results of the RACE and RT libraries. The RACE and RT methods are used to construct and sequence libraries for the same sample in parallel, and the consistency of high-frequency CDR3 in the two libraries is compared. The CDR3 homology cluster is locked, and the highest frequency full-length variable region DNA sequence containing the CDR3 homology cluster is selected from the RT library.
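The last step, locking a CDR3 homology cluster and taking the most frequent full-length sequence from the RT library, might be sketched in Python as follows. This is a simplified illustration under stated assumptions: cluster membership is decided by position-wise identity between equal-length CDR3 sequences, the 0.7 default mirrors the lower bound of the 70% to 99% homology range mentioned in the claims, and the record layout, the variant CDR3 and the `SEQ_A`/`SEQ_B`/`SEQ_C` placeholders are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical positions; sequences of unequal length score 0."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def top_full_length(rt_records, seed_cdr3, min_identity=0.7):
    """Most frequent full-length sequence among RT reads whose CDR3 falls
    in the homology cluster around `seed_cdr3` (None if there is none)."""
    counts = Counter(
        full_length
        for cdr3, full_length in rt_records
        if identity(cdr3, seed_cdr3) >= min_identity
    )
    return counts.most_common(1)[0] if counts else None


# Hypothetical RT library reads: (CDR3 amino acids, full-length DNA placeholder).
rt_records = (
    [("ARETDGMDV", "SEQ_A")] * 3
    + [("ARETDGLDV", "SEQ_B")] * 2      # one mismatch, same cluster
    + [("ARGFTYGHYFDY", "SEQ_C")] * 4   # unrelated CDR3, ignored
)

best = top_full_length(rt_records, "ARETDGMDV")
print(best)
```

An alignment-based identity measure would be needed to cluster CDR3 sequences of different lengths; the position-wise comparison is used here only to keep the sketch short.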

The method of concrete implementation is as follows:

RNA is separated from PBMC of a blood sample from a dengue patient at the acute stage, and heavy chain and light chain libraries are constructed at the same time by the RT method and by the RACE method described in embodiment 1.

The steps of RT library construction are as follows:

According to the method described in embodiment 1, the first strand of cDNA is obtained, and specific primers are used to amplify the variable regions of the heavy chain (VH), Kappa chain (VK) and Lambda chain (VL). The forward primers are designed in the conserved region of the variable region, and the reverse primers are designed in the constant region. The primer sequences are shown in Table 13, in which the heavy chain primers are quoted from REF1.

A 25 μL PCR amplification system is adopted, consisting of 2.2 μM VH, 3.6 μM VK or 2.2 μM VL forward primers, 1 μM reverse primers, 2.5 μL cDNA and 0.75 μL AccuPrime Taq Polymerase.

PCR program: 95° C. min; 25 cycles (95° C. 30 sec, 56° C. 30 sec, 72° C. 1 min); 72° C. 7 min; 4° C. For the gel cutting and purification of the PCR products, refer to the manual of the QIAquick Gel Extraction Kit (Qiagen).

For the construction of the high-throughput sequencing library, refer to the manual of the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB). After library construction is finished, a Bioanalyzer High Sensitivity DNA chip (Agilent) is used for QC, and the sequencing platform is Illumina Miseq 2x300.

TABLE 13  List of primers used in RT library construction

VH chain forward primers
IGHV_LR1     CGCAGACCCTCTCACTCAC
IGHV_LR2     TGGAGCTGAGGTGAAGAAGC
IGHV_LR3     TGCAATCTGGGTCTGAGTTG
IGHV_LR4     GGCTCAGGACTGGTGAAGC
IGHV_LR5     TGGAGCAGAGGTGAAAAAGC
IGHV_LR6     GGTGCAGCTGTTGGAGTCT
IGHV_LR7     ACTGTTGAAGCCTTCGGAGA
IGHV_LR8     AAACCCACACAGACCCTCAC
IGHV_LR9     AGTCTGGGGCTGAGGTGAAG
IGHV_LR10    GGCCCAGGACTGGTGAAG
IGHV_LR11    GGTGCAGCTGGTGGAGTC

VH chain reverse primers
HGR          AAGACCGATGGGCCCTTG

VK chain forward primers
kvf1         TGACCCAGTCTCCATCCTCC
kvf2         TGACCCAGTCTCCATCCTCA
kvf3         TGACCCAGTCTCCATCCTTCC
kvf4         TGACCCAGTCTCCATCCTTACT
kvf5         TGACCCAGTCTCCATCTGCC
kvf6         TGACCCAGTCTCCATCTTCC
kvf7         TGACACAGTCTCCAGCCACC
kvf8         TGACACAGTCTCCAGGCACC
kvf9         TGACGCAGTCTCCAGGCAC
kvf10        TGACCCAGTCTCCAGACTCC
kvf11        TGACTCAGTCTCCACTCTCCC
kvf12        TGACCCAGTCTCCATTCTCCC
kvf13        TGACCCAGACTCCACTCTCTC
kvf14        TGACCCAGACTCCACTCTCC
kvf15        TGACCCAGTCTCCTTCCACC
kvf16        TCACGCAGTCTCCAGCATTC
kvf17        TGACGCAGTCTCCAGCCAC
kvf18        TGACCCAGTCTCCATCTTCTG

VK chain reverse primers
KCR          ACACAACAGAGGCAGTTCCAG

VL chain forward primers
lvf1         GGTCCTGGGCCCAGTCTGTCG
lvf2         GGTCCTGGGCCCAGTCTGCC
lvf3         GTCCTGGGCCCAGTCTGTGCT
lvf4         TGTCAGTGGTCCAGGCAGGGC
lvf5         GATCCTGGGCTCAGTCTGCCCTG
lvf6         GCTCTGAGGCCTCCTATGAGCTG
lvf7         GATCCGTGGCCTCCTATGAGCTG
lvf8         TCTCTGAGGCCTCCTATGAGCTG
lvf9         GCTCTGCGACCTCCTATGAGCTG
lvf10        GTTCTGTGGTTTCTTCTGAGCTGAC
lvf11        GCTCTGTGACCTCCTATGTGCTG
lvf12        TCTCTGTGGCCTCCTATGAGCTG
lvf13        GTTCTGTGGCCTCCTATGAGCTG
lvf14        GTCTCTGTGCTCTGCCTGTGCTG
lvf15        GGTCTCTCTCCCAGCCTGTGCTG
lvf16        GGTCTCTCTCCCAGCTTGTGCTG
lvf17        GTTCCCTCTCGCAGCCTGTGCT
lvf18        GTTCCCTCTCGCAGGCTGTGCT
lvf19        GGTCCAATTCTCAGACTGTGGTGAC
lvf20        GGTCCAATTCCCAGGCTGTGGTG
lvf21        GAGTGGATTCTCAGACTGTGGTGAC
lvf22        GGTCCCTCTCCCAGCCTGTGC

VL chain reverse primers
LCR          AGTGTGGCCTTGTTGGCTTG

Library sequencing and analysis are carried out according to the method described in embodiment 1, and the correlation of high frequency CDR3 between the RACE library and the RT library is compared; see FIG. 6(a) and 6(b) for the analysis of CDR3 frequency correlation between the RACE library and the RT library.

The results show that the correlation R² value of heavy chain and Lambda chain CDR3 in the two libraries is about 0.7. As shown in FIG. 6(a), the consistency of the highest frequency heavy chain and light chain CDR3 is very good, and the high frequency heavy chain and light chain CDR3 in the top 10 of the antibody spectrum also show some consistency, indicating that there is no significant deviation in most of the high frequency CDR3 rankings between the two libraries. For example, the highest frequency heavy chain CDR3 (ARETDGMDV) of sample 11-2 in FIG. 6(b) is the highest frequency CDR3 in both the RACE and RT libraries. The amino acid sequences of the most abundant FR and CDR regions of the CDR3-ARETDGMDV homology cluster selected by using the RACE R12 file are FR1: QVQLVQSGAEVKRPGASVKVSCKAS, CDR1: GYTFTTYA, FR2: MHWVRQAPGQRLEWMGW, CDR2: INAGNGNT, FR3: KYSQKFQGRVTITRDTSASTAYMELSSLRSEDTAVYYC, FR4: WGQG. The amino acid sequences of the most abundant FR and CDR regions of the CDR3-ARETDGMDV homology cluster selected from the RT file are consistent with these, which indicates that this method can be used to splice the variable region of high frequency CDR3 genes.
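The correlation between the two libraries can be computed as the squared Pearson coefficient over the CDR3 sequences found in both, as in the Python sketch below. The frequencies are invented for illustration (only ARETDGMDV is a sequence from the text, and `CDR3_B` to `CDR3_D` are placeholders), so the printed value is not the R² reported for FIG. 6.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical CDR3 frequencies (%) in the two libraries of one sample.
race_freq = {"ARETDGMDV": 6.0, "CDR3_B": 3.0, "CDR3_C": 1.0, "CDR3_D": 0.5}
rt_freq = {"ARETDGMDV": 5.0, "CDR3_B": 1.0, "CDR3_C": 2.0}

# Only CDR3 sequences present in both libraries are compared.
shared = sorted(set(race_freq) & set(rt_freq))
r2 = r_squared([race_freq[c] for c in shared], [rt_freq[c] for c in shared])
print(shared, round(r2, 3))
```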

Embodiment 5 An Optimized Antibody Gene Screening Method for HA Monoclonal Antibody

Using the RACE antibody library construction method described in embodiment 1, total RNA is isolated from spleen cells of mice immunized with HA antigen, and RACE libraries of the heavy chain and light chain are constructed respectively. High-throughput sequencing is performed on the Illumina Miseq 2x250 system. The methods of data processing, the determination of FR and CDR regions, and the selection of candidate CDR3 amino acid sequences are the same as those described in embodiment 1.

The expression frame is constructed by fusion PCR; the frame includes a promoter, a variable region and an antibody constant region. The selected antibody heavy chain genes and light chain genes are randomly paired and transfected into 293FT cells for expression; the supernatant is collected for ELISA assay and its affinity to the antigen is tested.

For the concrete methods of in vitro expression, refer to the dengue virus case. ELISA results show that one antibody with affinity for the antigen is obtained from the pairing of candidate heavy chains and light chains.

TABLE 14  Analysis of heavy chain CDR3 frequency after HA immunization in mice

Heavy chain CDR3_Seq   Frequency after immunization (%)
TRGDY                  5.94
ARHTIPPYVMDY           1.56
ARDEGIYGY              1.48
ARGVYNYGRVWYFDV        1.36
ARRDYDNYVPFAY          1.25
TGDYEFGLFDY            1.24
ARLSGTFAY              1.01
ASLKGSAY               0.92
AREGGYYFDY             0.91
ARDNGHDWFAY            0.87
ARRDYGNYVPFAY          0.76
VLDYYGYAPFAY           0.75
ARDLYYSHGGFAY          0.70
ARVDGYLQGYYFGY         0.69
ARGREGNGAMDY           0.65
ARQEFYYGNYDAMDY        0.61
ASGILNVMDY             0.58
ARATVPAEIAY            0.57
ARWTGTGDYAMDY          0.56
ARSGLIYDGYYAWFAY       0.53

TABLE 15  Analysis of Kappa chain CDR3 frequency after HA immunization in mice

Kappa chain CDR3_Seq   Frequency after immunization (%)
WQGTHFPWT              4.62
QNGHSFPYT              4.11
QQYYSYPRT              3.86
QQYYRYPWT              1.97
QQYYNYRT               1.92
MQHLEYPFT              1.83
QQYYSYPWT              1.49
KQSYNLLT               1.31
LHYDNLWT               1.22
QQYYSYRT               1.11
LQYDNLLT               1.10
MQHLEYPYT              1.04
QNDHSFPLT              0.96
SQSTHVPPT              0.89
FQGSHVPWT              0.85
WQGTHFPQT              0.84
QQWSSNPFT              0.79
QQWSSNPPT              0.79
HQWSSYRT               0.78
KQSYNLWT               0.76

The applicant declares that the detailed method of the invention is explained by the above embodiments, but the invention is not limited to the above detailed methods; that is, it does not mean that the invention must rely on the above detailed methods for implementation. Those skilled in the art should understand that any improvement of the invention, the equivalent replacement of the raw materials of the products of the invention, the addition of auxiliary components, the selection of concrete ways, and the like, all fall within the protection scope and the disclosure scope of the invention.

1. A method for identifying specific functional antibodies, comprising: (1) extracting total RNA from at least one target antigen immunized host, and constructing an antibody sequence library therefrom; (2) performing high-throughput sequencing of variable regions of immunoglobulin genes present in said library, thereby obtaining sequences of the antibody variable region genes for the target antigen; (3) selecting candidate CDR3 homologous clusters from Read1 and/or Read2 and corresponding heavy chain and light chain antibody nucleic acid sequences, and pairing said heavy chain and light chain sequences with said candidate CDR3 homologous clusters from the variable region gene sequences obtained; and (4) expressing said CDR3, heavy and light chain encoding sequences in vitro, thereby producing candidate recombinant antibodies comprising said CDR3 homologous cluster.
2. The method as claimed in claim 1, wherein said antibody sequence library is generated by RACE and/or RT library construction.
3. The method according to claim 2, wherein a RACE library is constructed and cDNA is synthesized by reverse transcription using oligo (dT) as primer and said isolated total RNA as a template, said antibody sequence library being produced by using said synthesized cDNA as template.
4. The method according to claim 2, wherein an RT library is constructed and cDNA is synthesized by reverse transcription using oligo (dT) as primer and said isolated total RNA as template, said antibody sequence library being produced by separate amplification with specific primers using said synthesized cDNA as template.
5. The method according to claim 1, wherein said total RNA is isolated from any one or a combination of at least two of mammals, amphibians, fish or birds.
6. The method as claimed in claim 2, wherein said RACE library construction includes a common PCR amplicon library construction method and a two-step PCR library construction method.
7. The method as claimed in claim 1, wherein the high-throughput sequencing is performed by one or more of sequencing by synthesis, sequencing by ligation, sequencing by hybridization, single molecule DNA sequencing, multiplex polony sequencing or nanopore sequencing, preferably sequencing by synthesis, and more preferably Illumina platform sequencing.
8. The method as claimed in claim 1, wherein said candidate CDR3 is selected from any one or a combination of at least two of: high frequency sequences exhibiting CDR3 clustering; CDR3 sequences with significantly higher frequency after immunization or in the outbreak phase than in pre-immunization or the convalescence phase; or sequences for which the V gene family corresponding to the CDR3 after immunization or in the outbreak phase is significantly different from that in pre-immunization or the convalescence phase.
9. The method according to claim 1, wherein the selection of candidate CDR3 in step (3) and of the corresponding heavy chain and light chain antibody nucleic acid sequences as candidate pairing sequences comprises: (1′) selecting candidate CDR3 homologous clusters having a homology in the range of 70% to 99%; (2′) anchoring the CDR3 homology cluster and the full-length amino acid sequences of the high frequency antibody heavy chain and light chain variable regions to form candidate sequences in the gene spectrum of the variable region of antibody; and (3′) determining nucleic acid sequences of first and second pairing candidate sequences.
10. The method as claimed in claim 1, wherein the method also includes the identification of the screened antibody using at least one target antigen.
11. The method of claim 5, wherein said total RNA is obtained from a mammal selected from any one or a combination of at least two of humans, mice, primates, rabbits, goats, sheep or pigs.
12. The method of claim 11, wherein said total RNA is obtained from at least two of peripheral blood, lymphoid organs, spleen, bone marrow or liver.
13. The method of claim 1, wherein said total RNA is obtained from two or more cells selected from memory B cells, plasma cells, or plasmablasts.
14. The method of claim 6, wherein an antibody gene is amplified by a first round of PCR, and the amplification product is used for DNA library construction.
15. The method of claim 14, wherein the primers for PCR used in the first round of two-step PCR amplification include a partial forward primer of SEQ ID NO: 1 and a partial reverse primer selected from SEQ ID NO. 2-4, wherein the partial forward joint sequence optionally includes 5-60 bp of the 3′ end of the Illumina forward joint primer, and the partial reverse joint sequence optionally includes 5-60 bp of the 3′ end of the Illumina reverse joint primer.
16. The method of claim 14, wherein said forward joint primer is SEQ ID NO. 5, and the reverse joint primer is SEQ ID NO. 6.
17. The method of claim 10, wherein scFv or Fab fragments or IgG of the antibody are expressed in vitro, and the resultant scFv, Fab fragments or IgG are subjected to a binding force test with the target antigen.
18. The method of claim 17, wherein said scFv or Fab fragments or IgG of the antibody are expressed in prokaryotic cells, phage or yeast, or mammalian cells.
19. The method of claim 10, wherein the antibodies are identified using ELISA and/or SPR.